Identifying Statistical Dependence in Genomic Sequences via Mutual Information Estimates
Abstract
Questions of understanding and quantifying the representation and amount of information in organisms have become a central part of biological research, as they potentially hold the key to fundamental advances. In this paper, we demonstrate the use of information-theoretic tools for the task of identifying segments of biomolecules (DNA or RNA) that are statistically correlated. We develop a precise and reliable methodology, based on the notion of mutual information, for finding and extracting statistical as well as structural dependencies. A simple threshold function is defined, and its use in quantifying the level of significance of dependencies between biological segments is explored. These tools are used in two specific applications. First, they are used to identify correlations between different parts of the maize zmSRp32 gene. There, we find significant dependencies between the 5' untranslated region in zmSRp32 and its alternatively spliced exons. This observation may indicate the presence of as-yet unknown alternative splicing mechanisms or structural scaffolds. Second, using data from the FBI's Combined DNA Index System (CODIS), we demonstrate that our approach is particularly well suited to the problem of discovering short tandem repeats, an application of importance in genetic profiling.
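To make the central quantity concrete, here is a minimal sketch of a plug-in (empirical-frequency) estimate of the mutual information between two paired symbol streams. It is an illustration only: the abstract does not specify the estimator or the threshold function the authors use, so the function name `mutual_information` and the assumption that the inputs are paired observations (for example, the symbols seen at two positions across a set of aligned sequences) are hypothetical choices made for this example.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired observations.

    Hypothetical helper for illustration; not the estimator from the paper.
    xs and ys are equal-length sequences of symbols (e.g. strings over ACGT),
    where xs[i] and ys[i] are observed together.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")

    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each observed (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written in terms of counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    # A stream paired with itself: the estimate equals its empirical entropy.
    print(mutual_information("ACGT" * 5, "ACGT" * 5))
    # Every (x, y) combination equally frequent: empirically independent.
    print(mutual_information("AACC", "ACAC"))
```

With these toy inputs the first call gives 2 bits (four equiprobable symbols, fully dependent) and the second gives 0 bits. On real data, the plug-in estimate is biased upward for small samples, which is one reason a significance threshold of the kind the abstract mentions is needed before a dependency is reported.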
Related articles
Gene Clustering Based on Clusterwide Mutual Information
Cluster analysis of gene-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and constructing gene regulatory networks. The motivation for considering mutual information is its capacity to measure a general dependence among gene random variables. We propose a novel clustering strategy based on min...
Optimization of sequences in CDMA systems: a statistical-mechanics approach
The statistical-mechanics approach is useful not only in analyzing the macroscopic performance of wireless communication systems, but also in discussing their design problems. In this paper, we discuss a design problem of spreading sequences in code-division multiple-access (CDMA) systems, as an example demonstrating the usefulness of the statistical-mechanics approach. We...
Dependence Maximizing Temporal Alignment via Squared-Loss Mutual Information
The goal of temporal alignment is to establish time correspondence between two sequences, which has many applications in a variety of areas such as speech processing, bioinformatics, computer vision, and computer graphics. In this paper, we propose a novel temporal alignment method called least-squares dynamic time warping (LSDTW). LSDTW finds an alignment that maximizes statistical dependency ...
Measuring Statistical Dependence via the Mutual Information Dimension
We propose to measure statistical dependence between two random variables by the mutual information dimension (MID), and present a scalable parameter-free estimation method for this task. Supported by sound dimension theory, our method gives an effective solution to the problem of detecting interesting relationships of variables in massive data, which is nowadays a heavily studied topic in many...
Clustering of a Number of Genes Affecting in Milk Production using Information Theory and Mutual Information
Information theory is a branch of mathematics. Information theory is used in genetic and bioinformatics analyses and can be used for many analyses related to the biological structures and sequences. Bio-computational grouping of genes facilitates genetic analysis, sequencing and structural-based analyses. In this study, after retrieving gene and exon DNA sequences affecting milk yield in dairy ...
Journal title:
Volume 2007, Issue
Pages -
Publication date: 2007